XML schema

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

Several languages have been developed specifically to express XML schemas. The Document Type Definition (DTD) language, which is native to the XML specification, is a schema language of relatively limited capability, though it also has other uses in XML besides the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG.

The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means.
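As an illustration of the two routes, the following minimal sketch reads the W3C XML Schema hint attribute `xsi:noNamespaceSchemaLocation` from a document and falls back to an externally supplied location when no hint is present. The sample documents, the file names `note.xsd` and `catalog/note.xsd`, and the helper `schema_for` are hypothetical and chosen only for this example; other schema languages use different association mechanisms.

```python
import xml.etree.ElementTree as ET

# Namespace reserved by W3C XML Schema for attributes on instance documents.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical documents; "note.xsd" is an illustrative file name.
DOC_WITH_HINT = (
    '<note xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="note.xsd">'
    "<to>Ann</to>"
    "</note>"
)
DOC_WITHOUT_HINT = "<note><to>Ann</to></note>"


def schema_for(xml_text, external_default=None):
    """Return the schema location named in the document, else an external one."""
    root = ET.fromstring(xml_text)
    # ElementTree exposes namespaced attributes as "{namespace-uri}localname".
    hint = root.get("{%s}noNamespaceSchemaLocation" % XSI)
    return hint if hint is not None else external_default


# Association via markup inside the document itself.
print(schema_for(DOC_WITH_HINT))  # note.xsd
# Association via external means (here, a caller-supplied default).
print(schema_for(DOC_WITHOUT_HINT, external_default="catalog/note.xsd"))
```

The sketch only locates a schema; it does not load or apply it.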

Metamodel

The leader of the original XML team has acknowledged that the team did not begin with a data model: "In the interests of time, XML 1.0 did not define its own data model." [1] They developed a specification for data structures without first defining a data model of their own. A number of people "co-operated" by email over a short period to create the original specification. [2] The XSD specification that eventually followed has been widely criticised as "unreadable", [3][4][5] with one commentator going so far as to say: "it is one of the most heavily criticised specifications to come out of the organisation". [6] User surveys also highlight its verbose, complex and difficult language. [7]

One user notes: "One reason the spec is so unreadable is because it exposes the abstract model continuously." [8] Indeed, the allowed structure of information elements cannot be derived from this abstract model alone; doing so requires reading the text of the specification, which, as noted above, is difficult.

The accompanying diagram is an entity-relationship (ER) metamodel, drawn in UML notation, of the information elements of XSD. The model does not describe the components of the abstract data model; it addresses only the actual information elements themselves. The complexity of the model reflects the complexity of the specification itself.

Capitalization

There is some confusion as to when to use the capitalized spelling "Schema" and when to use the lowercase spelling. The lowercase form is a generic term that may refer to any type of schema, including DTD, XML Schema (also known as XSD), RELAX NG, or others; it should always be written in lowercase except at the start of a sentence. The capitalized form "Schema", as commonly used in the XML community, always refers to W3C XML Schema.

Validation

The process of checking whether an XML document conforms to a schema is called validation, which is separate from XML's core concept of syntactic well-formedness. All XML documents must be well-formed, but a document is not required to be valid unless the XML parser is "validating", in which case the document is also checked for conformance with its associated schema. DTD-validating parsers are the most common, but some parsers support W3C XML Schema or RELAX NG as well.
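The distinction can be seen with a non-validating parser. The sketch below uses Python's standard `xml.etree.ElementTree` module, which checks well-formedness only and does not validate against a schema; the helper name `is_well_formed` and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, regardless of any schema."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Mismatched end tag: not well-formed, so the parser rejects it outright.
print(is_well_formed("<book><title>XML</book>"))  # False
# Well-formed, although a schema for books might still reject it
# (for example, if a <title> child were required).
print(is_well_formed("<book><price>cheap</price></book>"))  # True
```

A document that fails this check is not XML at all, whereas a document that passes it may still be invalid with respect to a particular schema.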

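Documents are considered valid only if they satisfy the requirements of the schema with which they have been associated. These requirements typically include constraints of the kinds described above, such as rules on the order of elements, data types for the content of elements and attributes, and uniqueness or referential integrity constraints.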

Validation of an instance document against a schema can be regarded as a conceptually separate operation from XML parsing. In practice, however, many schema validators are integrated with an XML parser.
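A minimal sketch of that separation, assuming a toy set of hand-written constraints in place of a real schema language: the document is first parsed into a tree, and a second, independent pass checks the tree. The `<catalog>` vocabulary, the function `validate_catalog`, and the three rules are hypothetical; a real validator would read the rules from a DTD, XML Schema, or RELAX NG schema.

```python
import xml.etree.ElementTree as ET

# Toy "schema" for a hypothetical <catalog> of <book> elements.
REQUIRED_CHILD_ORDER = ["title", "price"]


def validate_catalog(root):
    """Return a list of constraint violations for an already-parsed tree."""
    errors = []
    seen_ids = set()
    for book in root.findall("book"):
        book_id = book.get("id")
        # Uniqueness constraint: each id attribute value may appear only once.
        if book_id in seen_ids:
            errors.append("duplicate id %r" % book_id)
        seen_ids.add(book_id)
        # Grammatical rule: children must be exactly <title> then <price>.
        if [child.tag for child in book] != REQUIRED_CHILD_ORDER:
            errors.append("book %r: expected children title, price" % book_id)
            continue
        # Data type constraint: <price> content must parse as a Python float.
        try:
            float(book.find("price").text)
        except (TypeError, ValueError):
            errors.append("book %r: price is not a number" % book_id)
    return errors


DOC = """<catalog>
  <book id="b1"><title>A</title><price>9.50</price></book>
  <book id="b1"><price>cheap</price><title>B</title></book>
  <book id="b2"><title>C</title><price>cheap</price></book>
</catalog>"""

tree = ET.fromstring(DOC)            # step 1: parsing (well-formedness only)
for problem in validate_catalog(tree):  # step 2: validation of the parsed tree
    print(problem)
```

Keeping the two steps apart means the same parsed tree could be checked against different sets of constraints; integrated validators make the same checks while parsing.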

XML schema languages

References
